library(tidyverse)
library(DT)
library(lubridate)
  1. Load the data into R.
  2. Descriptive Analysis
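    2-1. Summary Statistics
    2-2. Distribution of Days to Acknowledge
    2-3. Distribution of Days to Acknowledge by Profile Owner
    and more (2-4 through 2-8: by location, leader, week number, and on-time status)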
  3. Determine the optimal number of clusters using methods like the Elbow method.
  4. Perform K-means clustering.
  5. Analyze the resulting clusters to interpret different groupings of orders based on acknowledgment times and other relevant factors.
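
Steps 3 and 4 can be sketched as follows. This is only a minimal illustration, not the analysis itself: the `features` table is synthetic data made up for the example, and the choice of `days_to_acknowledge` and `week_number` as clustering inputs is an assumption. It uses base R's `kmeans()` and reads the Elbow curve from `tot.withinss`.

```r
# Illustrative sketch only: synthetic data stands in for order_late.
set.seed(42)  # kmeans() uses random starts, so fix the seed for reproducibility

# Hypothetical feature table: two well-separated groups of orders
features <- data.frame(
  days_to_acknowledge = c(rnorm(50, mean = 1, sd = 0.5), rnorm(50, mean = 8, sd = 1)),
  week_number         = c(rnorm(50, mean = 10, sd = 2),  rnorm(50, mean = 40, sd = 2))
)

# Standardise so that no single column dominates the Euclidean distance
scaled <- scale(features)

# Elbow method: total within-cluster sum of squares for k = 1..6
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(scaled, centers = k, nstart = 25)$tot.withinss
})

plot(k_values, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")

# Fit the final model at the k where the curve flattens
# (chosen as 2 here because the synthetic data has two groups)
fit <- kmeans(scaled, centers = 2, nstart = 25)
table(fit$cluster)
```

On the real data, the value of k should be read off the plot rather than fixed in advance, and the columns passed to `scale()` would need to be numeric.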

1. Load the data into R.

The data is already loaded as order_late. Let’s take a look at the dataset to get a sense of what we’re working with.

order_late %>%
  DT::datatable(options = list(scrollX = TRUE))
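
The loading step itself is not shown in this section. As a rough sketch of what it might look like, the example below assumes the data arrives as a CSV file with ISO-formatted dates. The file, its contents, and the name `order_late_demo` are hypothetical, and base R is used only to keep the sketch free of dependencies.

```r
# Hypothetical file: a temporary CSV stands in for the real export
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "order,order_date,date_acknowledge,days_to_acknowledge,on_time",
  "A001,2024-01-02,2024-01-03,1,1",
  "A002,2024-01-03,2024-01-08,5,0"
), csv_path)

# Read the file, keeping text columns as character
order_late_demo <- read.csv(csv_path, stringsAsFactors = FALSE)

# Parse the date columns (assumes "YYYY-MM-DD" strings)
date_cols <- c("order_date", "date_acknowledge")
order_late_demo[date_cols] <- lapply(order_late_demo[date_cols], as.Date)

str(order_late_demo)
```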

Data Description:

  • profile_owner: The identifier of the individual who owns the profile related to the order.

  • leader_name: The identifier of the leadership or supervisory figure associated with the order or the profile owner.

  • loc: A code or number that represents the location where the order was processed or is to be fulfilled from.

  • order: The unique identifier assigned to the order.

  • customer: The name of the individual or entity to whom the order will be delivered.

  • order_date: The date on which the order was placed or recorded.

  • week_number: The week of the year when the order was placed, which could be useful for seasonal analysis.

  • delivery_date: The date when the order is scheduled to be delivered to the customer.

  • ship_date: The actual date when the order was shipped out from the facility.

  • date_acknowledge: The date on which the order acknowledgment was recorded in the system.

  • date_acknowledgement_calc: Calculated date for when the order was supposed to be acknowledged, possibly used for performance tracking.

  • days_to_acknowledge: The number of days it took to acknowledge the order from the order date, a measure of processing time.

  • on_time: An indicator of whether the order was acknowledged within the expected time frame: ‘On Time’ = 1, ‘Not on Time’ = 0.

Together, these columns describe how quickly and how reliably orders are acknowledged. Finding patterns and relationships among them, through clustering or other analysis methods, could help identify bottlenecks, predict future performance, and improve service delivery.
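
To make the two derived columns concrete, here is a small sketch of how they could be computed from the date columns. The rows are invented for illustration, and both rules are assumptions: that `days_to_acknowledge` counts calendar days, and that an order is on time when `date_acknowledge` falls on or before `date_acknowledgement_calc`. The source system may define them differently (for example, in business days).

```r
# Hypothetical rows; the real values live in order_late
orders <- data.frame(
  order                     = c("A001", "A002", "A003"),
  order_date                = as.Date(c("2024-01-02", "2024-01-03", "2024-01-05")),
  date_acknowledge          = as.Date(c("2024-01-03", "2024-01-08", "2024-01-05")),
  date_acknowledgement_calc = as.Date(c("2024-01-04", "2024-01-05", "2024-01-08"))
)

# Calendar days between placing the order and acknowledging it
orders$days_to_acknowledge <- as.integer(orders$date_acknowledge - orders$order_date)

# Assumed rule: on time (1) when acknowledged on or before the calculated date
orders$on_time <- as.integer(orders$date_acknowledge <= orders$date_acknowledgement_calc)

orders[, c("order", "days_to_acknowledge", "on_time")]
```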

2. Descriptive Analysis

2-1. Summary Statistics

order_late %>% dplyr::summarise(
  Mean = mean(days_to_acknowledge, na.rm = TRUE),
  Median = median(days_to_acknowledge, na.rm = TRUE),
  Min = min(days_to_acknowledge, na.rm = TRUE),
  Max = max(days_to_acknowledge, na.rm = TRUE),
  SD = sd(days_to_acknowledge, na.rm = TRUE)
)

2-2. Distribution of Days to Acknowledge

order_late %>% 
  ggplot(aes(x = days_to_acknowledge)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Days to Acknowledge",
       x = "Days to Acknowledge",
       y = "Frequency") +
  theme_minimal()

2-3. Distribution of Days to Acknowledge by Profile Owner

order_late %>% 
  ggplot(aes(x = days_to_acknowledge)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Days to Acknowledge by Profile Owner",
       x = "Days to Acknowledge",
       y = "Frequency") +
  facet_wrap(~profile_owner) +
  theme_minimal()

2-4. Distribution of Days to Acknowledge by Location

order_late %>% 
  ggplot(aes(x = days_to_acknowledge)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Days to Acknowledge by Location",
       x = "Days to Acknowledge",
       y = "Frequency") +
  facet_wrap(~loc) +
  theme_minimal()

2-5. Distribution of Days to Acknowledge by Leader

order_late %>% 
  ggplot(aes(x = days_to_acknowledge)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Days to Acknowledge by Leader",
       x = "Days to Acknowledge",
       y = "Frequency") +
  facet_wrap(~leader_name) +
  theme_minimal()

2-6. Distribution of Days to Acknowledge by Week Number

order_late %>% 
  ggplot(aes(x = days_to_acknowledge)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Days to Acknowledge by Week Number",
       x = "Days to Acknowledge",
       y = "Frequency") +
  facet_wrap(~week_number) +
  theme_minimal()

2-7. Summary Statistics by On Time

order_late %>%
  group_by(on_time) %>%
  summarise(
    Mean_days_to_acknowledge = mean(days_to_acknowledge, na.rm = TRUE),
    Median_days_to_acknowledge = median(days_to_acknowledge, na.rm = TRUE),
    SD_days_to_acknowledge = sd(days_to_acknowledge, na.rm = TRUE),
    Min_days_to_acknowledge = min(days_to_acknowledge, na.rm = TRUE),
    Max_days_to_acknowledge = max(days_to_acknowledge, na.rm = TRUE),
    Count = n()
  )

2-8. Distribution of Days to Acknowledge by On Time

order_late %>% 
  ggplot(aes(x = days_to_acknowledge)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Days to Acknowledge by On Time",
       x = "Days to Acknowledge",
       y = "Frequency") +
  facet_wrap(~on_time) +
  theme_minimal()

Next: I need to update the descriptive analysis page to match the new data. (The Shiny dashboard needs to change as well.)